ABSTRACT

The U1 small nuclear ribonucleoprotein (snRNP) 
plays pivotal roles in pre-mRNA splicing and in 
regulating mRNA length and isoform expression; 
however, the mechanism of U1 snRNA quality 
control remains undetermined. Here, we describe a 
novel surveillance pathway for U1 snRNP biogen- 
esis. Mass spectrometry-based RNA analysis 
showed that a small population of SMN complexes 
contains truncated forms of U1 snRNA (U1-tfs) 
lacking the Sm-binding site and stem loop 4 but 
containing a 7-monomethylguanosine 5' cap and a 
methylated first adenosine base. U1-tfs form a 
unique SMN complex, are shunted to processing 
bodies and have a turnover rate faster than that of 
mature U1 snRNA. U1-tfs are formed partly from the 
transcripts of U1 genes and partly from those 
lacking the 3' box elements or having defective 
SL4 coding regions. We propose that U1 snRNP biogenesis
is under strict quality control: U1 transcripts
are surveyed at the 3'-terminal region and U1-tfs are
diverted from the normal U1 snRNP biogenesis
pathway.

INTRODUCTION 

Ribonucleoproteins (RNPs) are a class of RNA-protein
complexes that facilitate many cellular processes. One of
the most prominent examples is pre-mRNA splicing,
which is driven by the spliceosome. The major spliceosomal
components are small nuclear RNPs (snRNPs), each of
which consists of an snRNA (U1, U2, U4/U6 or U5), a
common heptameric ring of Sm proteins (B/B', D1, D2,
D3, E, F and G) assembled around the snRNA's Sm-binding
site, and several proteins that are unique to each
specific snRNP; for instance, the proteins for U1 snRNP
are U1-A, U1-C and U1-70K (1). Assembly of Sm proteins
on an snRNA is a key step in snRNP biogenesis that takes
place in the cytoplasm shortly after the nuclear export of
nascent snRNA precursors (pre-snRNAs). Proper assembly
of the Sm proteins, 5' cap hypermethylation and 3' end
processing of the snRNAs are prerequisites for the subsequent
import of snRNPs into the nucleus (1-4).

The remarkable assembly of the seven Sm proteins on
the snRNA (5,6) is carried out by a complex containing
SMN, a product of SMN1 that is mutated in the neuromuscular
disease spinal muscular atrophy (7). The SMN
complex contains eight proteins: Gemins 2 (SIP1), 3 (a
DEAD-box RNA helicase), 4, 5 [a tryptophan-aspartic
acid (WD)-repeat protein], 6, 7, 8 and Unrip (unr-interacting
protein) (8,9). Importantly, SMN prevents unproductive
associations between Sm proteins and RNAs
(10-12). Among the components of the SMN complex,
Gemin5 determines the specificity for snRNAs; for U1
snRNA, Gemin5 binds pre-U1 snRNA at both the loop
region of stem-loop (SL) 1 and the SL4 region (5) directly
on its own via its WD-repeat domain (13) and delivers
pre-snRNAs to sites of Sm core assembly and processing. On
the other hand, Gemin2 binds a pentamer of Sm proteins
containing SmD1, SmD2, SmE, SmF and SmG (14-16).
Gemin2 interacts with all five Sm proteins, and its
extended conformation enables it to wrap around the
entire crescent-shaped pentamer. This prevents the Sm
pentamer from assembling on unintended RNAs (12,17).
To allow pre-snRNA binding, the N-terminal region of
Gemin2 must be displaced from the Sm pentamer's
RNA-binding pocket; the mechanistic details of this
process, however, remain unclear. Finally, two additional
Sm proteins, SmB/B' and SmD3, associate with the Sm
pentamer, presumably through direct interaction with
SMN (18-23), in a process involving Gemins 3, 4, 6, 7, 8
and Unrip (4,24-28).

In Drosophila, the assembly of Sm proteins on U 
snRNAs takes place in cytoplasmic U-bodies that invari- 
ably associate with processing bodies (P-bodies), which 
function in RNA surveillance and turnover (29). Several
lines of evidence suggest that SMN is required for the
functional integrity of the U-body-P-body pathway,
which is important for maintaining proper nuclear architecture
in germline cells (30). However, how the functional
integrity of the U-body-P-body pathway is related 
to snRNP biogenesis and how aberrant snRNPs are 
discriminated from normal snRNPs during their biogen- 
esis remain unknown. 

Here we used a mass spectrometry (MS)-based method
to directly identify cellular RNAs in an unbiased manner.
This high-throughput approach has previously yielded
information regarding RNA modifications (31-34), and its
use allowed us to identify novel truncated forms of U1
snRNA (U1-tfs) that have a 7-monomethylguanosine
(m7G) 5' cap and base methylation at the first transcribed
nucleotide but lack the Sm-binding site and SL4 region.
U1-tfs are generated de novo; form complexes with the
phosphorylated adaptor for nuclear export (PHAX),
SMN, Gemins 2, 3, 4, 5, 6 and 8; and are mostly localized
in P-bodies. We show that failure of Sm protein loading
on U1-tfs is responsible for the previously observed localization
of the SMN complex in P-bodies (29,30).

MATERIALS AND METHODS 

MS-based RNA analysis 

RNAs were separated on a Develosil C30-UG-2 column
(3 µm, 2.0 mm i.d., 100 mm long, Nomura Chemical)
at 60°C by gradient elution at a flow rate of 50 µl/min.
The solvents were A, 400 mM hexafluoroisopropanol/
triethylamine (pH 7.0) and B, 100% methanol. After
applying an RNA mixture, the column was eluted with
10% B, followed by a gradient to 23% B over 1 min
and then to 28% B over 60 min. RNAs were detected by
monitoring A260 and, where necessary, the fractionated
RNA was digested with RNase T1 (10 ng) in 10 mM
sodium acetate (pH 5.3) at 37°C for 60 min. The resulting
oligonucleotide mixture was then analyzed by the nanoflow
LC-tandem MS (MS/MS) system (32,33) equipped with the
genome-oriented database search engine Ariadne (31). We
used the genome database of Homo sapiens (reference
assembly version GRCh37 obtained from
ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/) under the following
parameters: maximum number of missed cleavages,
0; variable modification parameter, one methylation per
RNA fragment for any residue; RNA mass tolerance,
±20 ppm; and MS/MS tolerance, ±750 ppm.
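The two mass tolerances above are relative (parts-per-million) windows. As a minimal sketch of how such windows translate into absolute mass ranges, the following Python snippet may help; the function names are hypothetical and are not part of Ariadne, and the example masses are illustrative only.

```python
# Illustrative only: ppm tolerance arithmetic for the search parameters
# quoted in the text (20 ppm for RNA mass, 750 ppm for MS/MS fragments).
# Function names are hypothetical and unrelated to the Ariadne software.

RNA_MASS_TOL_PPM = 20.0   # precursor (RNA fragment) tolerance from the text
MSMS_TOL_PPM = 750.0      # product-ion tolerance from the text


def ppm_error(observed, theoretical):
    """Signed deviation of an observed mass from theory, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm):
    """True if the observed mass lies inside the +/- tol_ppm window."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def mass_window(theoretical, tol_ppm):
    """Absolute (low, high) bounds of the ppm window around a mass."""
    delta = theoretical * tol_ppm / 1e6
    return theoretical - delta, theoretical + delta
```

For a theoretical mass of 1000, for example, a 20 ppm window spans only 0.02 mass units on either side, whereas the 750 ppm fragment window spans 0.75.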

Cloning and construction of plasmids for exogenous
expression of U1 snRNA or its truncated mutants

To generate plasmids to exogenously express U1 snRNA,
we first amplified the region encoding human U1 (chromosome
1, gene ID 26871) and flanking regions from human
genomic DNA extracted from HEK293 cells for use as the
PCR template with the primer set
5'-GAAGGATCCGTTTCTTTTGTAATCCGAAACA-3' and
5'-CAACTCGAGCTCTATGAGGTGAGAACACACT-3'. The amplified
DNA fragment was digested with BamH I/Xho I and
ligated into the corresponding sites of pcDNA3.1. After
its sequence was verified, the U1 gene-containing DNA
fragment was excised with BamH I/Xho I and ligated
into pcDNA3.1 (pcDdCMV-U1; WT); the cytomegalovirus
(CMV) promoter was removed with Bgl II/Xho I,
thereby avoiding CMV promoter-based transcription.

To construct U1 snRNA-expressing plasmids lacking
the Sm or Sm-SL4 region or the cis-acting DNA
elements DSE, PSE or 3'box (ΔSm, ΔSmSL4, ΔDSE,
ΔPSE, Δ3'box), PCR was done using WT vector as
template and the primer set
5'-TATGCAGTCGAGTTTCCCACATTTG-3' and
5'-TAGTGGGGGACTGCGTTCG-3' for ΔSm,
5'-TATGCAGTCGAGTTTCCCACATTTG-3' and
5'-ACTTTCTGGAGTTTCAAAAACAGAC-3' for ΔSmSL4,
5'-CTGTCCGTGATGTCACCGACAG-3' and
5'-GGCAGCGCAGAGGCTGCTG-3' for ΔDSE,
5'-GCCCCGCGCACTCCCGAG-3' and
5'-GAGTGAGGCGTATGAGGCTGTGTC-3' for ΔPSE or
5'-TCCAGAAAGTCAGGGGAAAGC-3' and
5'-CCGTACGCCAAGGGTCATGTC-3' for Δ3'box.
The blunt-ended PCR products were phosphorylated
by T4 polynucleotide kinase and then self-ligated.
Deletion of a specific sequence element in each
construct was verified by DNA sequencing.

To generate the RNA tag-fused U1 snRNA expression
vector, the U1 coding sequence was PCR amplified from
WT U1 template using the primer set
5'-GGAATCGATATACTTACCTGGCAGGGGAG-3' and
5'-CAACTCGAGCTCTATGAGGTGAGAACACACT-3' to add a Cla I
site near the 5' end of U1 snRNA. The amplified DNA
fragment encoding U1 snRNA was cleaved with Cla
I/Xho I.

The y18Sn tag was prepared using the oligonucleotides
5'-GGAAGATCTCATACTTACCTGCGAGGATTCAGGCTTTGGATCGATGGA-3' and
5'-TCCATCGATCCAAAGCCTGAATCCTCGCAGGTAAGTATGAGATCTTCC-3',
which were annealed and cleaved with Bgl II/Cla
I. In the former primer sequence, the first 11 nucleotides of
U1 snRNA (ATACTTACCTG) were included just before
the 5' end of the aptamer tag sequence. We used the
Bgl II site present in the region just before the U1
coding sequence within the U1 gene sequence for construction.
This was done based on a report that at least
the first nine nucleotides of U1 snRNA are required for
recognition of the canonical 5' pre-mRNA splice site (35).
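The design constraints on the two y18Sn oligonucleotides (they must anneal to each other, carry Bgl II and Cla I sites, and place the first 11 nucleotides of U1 upstream of the tag) can be checked mechanically. The sketch below is an illustration only, not part of the original protocol; the function and constant names are hypothetical, and the recognition sequences used (AGATCT for Bgl II, ATCGAT for Cla I) are the standard ones for these enzymes.

```python
# Illustrative sanity check of the two annealed y18Sn oligonucleotides
# quoted in the text. Names are hypothetical helpers, not from the paper.

Y18SN_TOP = "GGAAGATCTCATACTTACCTGCGAGGATTCAGGCTTTGGATCGATGGA"
Y18SN_BOTTOM = "TCCATCGATCCAAAGCCTGAATCCTCGCAGGTAAGTATGAGATCTTCC"

U1_FIRST_11 = "ATACTTACCTG"  # first 11 nt of U1 snRNA, as DNA (from the text)
BGL_II_SITE = "AGATCT"       # standard Bgl II recognition sequence
CLA_I_SITE = "ATCGAT"        # standard Cla I recognition sequence

_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(_PAIRS[base] for base in reversed(seq))


def check_tag_oligos(top, bottom):
    """Summarize whether the oligo pair satisfies the stated design."""
    return {
        "fully_complementary": reverse_complement(top) == bottom,
        "has_bgl_ii_site": BGL_II_SITE in top,
        "has_cla_i_site": CLA_I_SITE in top,
        # 0-based index where the U1 5' sequence starts in the top strand
        "u1_prefix_index": top.find(U1_FIRST_11),
    }


report = check_tag_oligos(Y18SN_TOP, Y18SN_BOTTOM)
```

In this check the U1-derived sequence sits between the Bgl II site and the tag, consistent with the description that it was included just before the 5' end of the tag sequence.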
On the other hand, the RAT tag (36) was synthesized using
the primer set
5'-GGAAGATCTCATACTTACCTGTAAGGAGTTTATATGGAAACCCTTAGACGTCGGCACGAGGTTTAG-3' and
5'-TCCATCGATGGCACGAGTGTAGCTAAACCTCGTGCCGACGTC-3', which were
annealed and amplified by PCR. The amplified DNA
fragment was cut with Bgl II/Cla I. The DNA fragment
encoding U1 snRNA prepared with Cla I/Xho I and that
encoding the y18Sn or RAT tag prepared with Bgl II/Cla I
were ligated into the Bgl II/Xho I site of WT vector.
The vector expressing y18Sn-U1 snRNA was named
y18Sn-WT, and that expressing RAT-U1 snRNA was
named RAT-WT. All constructs were verified by DNA
sequencing.

RNA aptamer-based affinity purification 

To isolate RNPs, we used the RAT tag-based method (36)
with modifications. Briefly, instead of using purified
recombinant PP7 coat protein (PP7CP), we cotransfected
two plasmids (total 30 µg) into 293T cells (5 x 10^ in
DMEM) using the calcium phosphate method; one
plasmid expressed RAT-tagged RNA and another expressed
HA-FLAG (HF)-tagged PP7CP (pcDNA3.1-PP7CP-HF)
in a 7:1 ratio. At 24 h post-transfection,
cells were collected and rinsed once with phosphate-buffered
saline, calcium/magnesium-free [PBS(-)], lysed in
P100 lysis buffer [50 mM Tris-HCl (pH 8.0), 100 mM
KCl, 5 mM MgCl2, 0.5% IGEPAL-CA630, 1 mM
PMSF] and incubated on ice for 30 min. The soluble
fraction, prepared by centrifugation at 20 000 g for
30 min at 4°C, was mixed with anti-FLAG-M2-conjugated
agarose beads and rotated for 3 h at 4°C. PP7CP-HF
bound to RAT-tagged RNA-protein complexes was
captured by the beads. After washing five times with
P100 lysis buffer and once with P100 lysis buffer without
IGEPAL-CA630, RAT-tagged RNA-protein complexes
were eluted from the beads with 500 µg/ml FLAG
peptide in the same buffer.

Fluorescence in situ hybridization 

Fluorescence in situ hybridization (FISH) was carried out
as described by Rouquette et al. (37). Briefly, HeLa or
293T cells were seeded on collagen-coated culture
slides, fixed with 4% paraformaldehyde in PBS, washed
twice with PBS and permeabilized for 16 h at 4°C in 70%
ethanol. After washing two times with 2x SSC containing
10% formamide to rehydrate the cells, hybridization was
done with 0.5 ng/ml of a fluorescently labeled DNA probe
in hybridization solution (10% formamide, 2x SSC,
0.5 mg/ml yeast tRNA, 10% dextran sulfate, 50 µg/ml
BSA, and 10 mM ribonucleoside vanadyl complexes)
for 3 h at 37°C. The following fluorochrome-labeled
DNA probes were used: 5'-Cy3-labeled y18Sn probe
(5'-CCAAAGCCTGAATCCTCG-3'), 5'-FITC-labeled
U1-#1 probe (5'-GTATCTCCCCTGCCAGGTAAGTAT-3'),
5'-FITC-labeled U1-#3 probe
(5'-TATGCAGTCGAGTTTCCCACATTTGG-3'), 5'-FITC-labeled U1-#SL4
probe (5'-GAAAGCGCGAACGCAGTCCCCCACTA-3'),
5'-Cy3-labeled U2 probe
(5'-CTACACTTGATCTTAGCCAAAAGGCCGAGAAGC-3') and 5'-Cy3-labeled
5'ITS1 probe (5'-CCTCGCCCTCCGGGCTCCGTTAATGATC-3').
When a mixture of Cy3- and FITC-labeled
probes was used, the concentration of each probe was
0.5 ng/ml. After hybridization, the cells were washed
twice with 2x SSC, 10% formamide and once with PBS.
Immunocytochemistry was carried out as described in
Supplementary information. The staining was observed
with an Axiovert 200 M microscope.

RESULTS 

MS-based analysis of SMN-associated RNAs
identifies U1-tfs

We first expressed WT SMN from a single-copy transgene
encoding a triple affinity-purification (DAP) tag (biotin
and FLAG tags useful for visualization and purification
and an additional N-terminal 6-histidine epitope tag;
Figure 1A) (38). The expressed DAP-SMN localized not
only in the cytoplasm but also in the nucleus as foci,
as reported for endogenous SMN (Supplementary
Figure S1A). Pull-down using DAP-SMN as bait
showed that DAP-SMN associated with Gemins 2, 3, 4,
5, 6 and 8, Unrip, coilin, SmB/B', SmD1, SmE and U1A
as reported (11,39,40) (Supplementary Figure S1B).
Northern blotting showed that the DAP-SMN complexes
contained, minimally, U1, U2, U4, U5, U7 and U11
snRNAs, each of which is known to associate with
SMN (12,39) (Figure 1B), suggesting that DAP-SMN
was functionally indistinguishable from endogenous
SMN.

SYBR Gold-stained RNA bands were excised, digested
with RNase T1 and analyzed by nanoflow LC-MS/MS in
combination with Ariadne, a genome-wide search engine
for RNA identification (31). We also analyzed the pulled-down
RNAs by the LC method, where RNAs were
separated by reverse-phase semimicro-LC, fractionated,
digested with RNase T1 and analyzed by nanoflow
LC-MS/MS and Ariadne (Figure 1C and Supplementary
Tables S1 and S2). These analyses identified, minimally,
U1, U2, several forms of U5, U6, U7 and U11 snRNAs
(Figure 1B and C and Supplementary Tables S1 and S2).
In addition to U1 snRNA, LC-MS/MS identified U1-tfs
in a stained band (~120 nt) that migrated faster than
mature U1 snRNA (Figure 1B) and in the fractions
eluting earlier than mature U1 snRNA (Figure
1C). On RNase T1 digestion, the U1-tfs generated 21
distinct oligonucleotides originating from the region
1-121 of mature U1 snRNA (Supplementary Table S2),
but we did not detect oligonucleotides from the 3' region
122-164. To confirm the MS-based identification of U1-tfs,
we used four DNA probes complementary to regions
1-24 (probe #1), 65-86 (#2), 100-124 (#3) and 125-147
(#4) of mature U1 snRNA (Figure 1D) and performed
northern blotting to detect the RNAs extracted from
DAP-SMN complexes or total cellular RNA. U1-tfs
were detected by probes #1, #2 and #3 but not #4
(Figure 1D); these results agreed with those of the MS-based
identification. We also analyzed the nucleotide sequences
of RNAs extracted from the SYBR Gold-stained
U1-tfs band (Figure 1B) by the 3' rapid amplification of
cDNA ends protocol, which confirmed that each
sequence analyzed had a length of 118-127 nt starting
from the 5' end of the mature U1 snRNA
(Supplementary Table S3). These results clearly indicated
that U1-tfs lack the Sm-binding site and SL4 region. In
addition, the results for total RNA indicated that U1-tfs
were not an artifact formed during the pull-down of
DAP-SMN complexes (Figure 1D).
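The identification above rests on RNase T1 digestion, which cleaves on the 3' side of guanosine residues. As a rough illustration of how a sequence is converted into the oligonucleotides that are then matched by MS, the following sketch performs an in silico digest with no missed cleavages. The 24-nt input is an assumption made for illustration: it is the RNA reverse complement of probe #1 given in the Methods (regions 1-24 of U1 snRNA), and modifications, caps and terminal phosphates are ignored.

```python
# Illustrative in silico RNase T1 digest (cleavage after every G, with
# no missed cleavages). The helper name is hypothetical; modifications
# and terminal chemistry (cap, >p, -OH) are deliberately ignored.

# Assumed 5' region of U1 snRNA (nt 1-24), derived here as the RNA
# reverse complement of probe #1 (5'-GTATCTCCCCTGCCAGGTAAGTAT-3').
U1_5P_REGION = "AUACUUACCUGGCAGGGGAGAUAC"


def rnase_t1_digest(rna):
    """Split an RNA sequence after each G and return the fragments."""
    fragments = []
    current = ""
    for base in rna:
        current += base
        if base == "G":          # T1 cleaves 3' of guanosine
            fragments.append(current)
            current = ""
    if current:                  # trailing piece that does not end in G
        fragments.append(current)
    return fragments


fragments = rnase_t1_digest(U1_5P_REGION)
```

The first fragment, AUACUUACCUG, has the same base sequence as the 5'-terminal oligonucleotide discussed in the next section; the last fragment is truncated only because the input stops at nucleotide 24.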
Figure 1. Identification of U1-tfs as a novel RNA component associated with SMN. (A) Schematic diagram of DAP-tagged SMN (DAP-SMN) used
for pull-down analysis. SMN was fused at the N-terminal (N) with a 6-histidine tag and biotinylation sequence and at the C-terminal (C) with a
FLAG tag connected by a TEV cleavage sequence. (B) DAP-SMN-associated RNAs were separated by denaturing PAGE (7.5 M urea, 9%
polyacrylamide). RNAs were visualized by SYBR Gold staining (left) or by northern blotting with probes complementary to RNA as indicated (right).
Input is the cell extract used for pull-down analysis. DAP-SMN: extract of DAP-SMN-expressing T-REx 293 cells. T-REx: extract of control parent
cells. PD: pull-down. (C) A typical chromatogram of standard RNAs (upper) or of the DAP-SMN-associated RNAs (bottom) separated by
reverse-phase LC. Each fraction (1-7) was subjected to LC-MS/MS-Ariadne analysis. Sizes of the standard RNAs or the specific RNAs identified by MS are
indicated. (D) Northern blot analysis for DAP-SMN-associated RNA or total RNA of HeLa cells. The analysis was done with the probes (#1, #2, #3
and #4) corresponding to the regions that are shown under the sequence of U1 snRNA (top). The region corresponding to #SL4 probe is also given.
DAP-SMN-associated RNA or total RNA was separated by denaturing urea-PAGE and stained with SYBR Gold or analyzed by northern blotting
with probes #1-4.
U1-tfs have a 7-monomethylguanosine 5' cap and
a methylated first adenosine base and lack the
methylation at position 70

During the LC-MS/MS analyses, collision-induced dissociation
generated a series of product ions from the
RNA fragments. The product ions included the major
c/y and a/w series with minor derivatives (dehydrated
ions and ions that lost nucleotide bases; Supplementary
Figure S2A and Supplementary Tables S1 and S2), as
reported (33). Among the oligonucleotides generated
from the U1 snRNA fractions (Figure 1C), we mainly
detected m3GpppAmUmACUUACCUGp (or
m3GpppAmUmACΨΨACCUGp) (containing a 5' cap of
2,2,7-trimethylguanosine, m3G) that originated from the
5' end (Figure 2A, Supplementary Figure S2B and C
and Supplementary Table S2) and CUUUCCCCUG-OH
from the 3' end (Supplementary Figure S2D and
Supplementary Table S2), indicating that the oligonucleotides
mostly originated from mature U1 snRNA. We also
detected CUUUCCCCUG>p (2',3'-cyclic phosphate) in a
ratio of ~1:6 with CUUUCCCCUG-OH (estimated using
relative mass intensities; Supplementary Figure S2D and
Supplementary Table S2). Because RNase T1 generates
CUUUCCCCUG>p only when it cleaves on the 3' side of
an internal G, those SMN-associated RNAs probably
contained at least one additional nucleotide at the 3' end.
The MS-based method
could distinguish a one-nucleotide extension as well as
the absence of a phosphate or hydroxyl group at the 3'
terminus of the mature U1 snRNA. In addition, we
detected methylation at position 70 in ~90% of the
oligonucleotides corresponding to the region 69-75 of mature
U1 snRNA (CAmCUCCG>p; Supplementary Table S2,
Figure 2B and Supplementary Figure S2E), as reported
(41). On the other hand, one oligonucleotide generated
from the 5' end region of U1-tfs had an m7G cap
(Figure 2A) and three methyl groups in the first two
nucleotides (Supplementary Figure S2B and C); two of the
latter reflected ribose 2'-O-methylation at the first and the
second transcribed nucleotides, and the remaining methyl
group was found in the adenine base of the first nucleotide
(m7GpppmAmUmACp; Supplementary Table S2 and
Supplementary Figure S2B and C). In addition, U1-tfs
produced almost exclusively an oligonucleotide without
the methylation at position 70 (CACUCCG, 69-75;
Supplementary Table S2, Figure 2B and Supplementary
Figure S2E). Supplementary Figure S2F summarizes the
secondary structure of U1-tfs identified by MS-based
analysis in comparison with mature U1 snRNA.

U1-tfs associate with proteins involved in early steps of
U1 snRNP biogenesis in the cytoplasm

Given that U1-tfs have an m7G cap, we examined whether
U1-tfs associate specifically with proteins involved in
early steps of U1 snRNP biogenesis. We used PHAX,
Figure 2. Assignment of post-transcriptional modifications of U1-tfs using LC-MS/MS. (A) MS/MS spectrum of the 5'-terminal oligonucleotide
derived from RNase T1 digest of U1 snRNA (upper left) or U1-tfs (bottom left). Various MS/MS fragments originating from the m3G cap of U1
snRNA or from the m7G cap of U1-tfs are indicated. Nomenclature for the product ions generated by collision-induced dissociation is shown in
Supplementary Figure S2A. (B) Reverse-phase LC separation of the RNase T1 digest of U1 snRNA or U1-tfs. Effluent was monitored as the count
of total ions (top), m/z = 1099.14 (CACUCCG>p; middle) or m/z = 1106.15 (CAmCUCCG>p; bottom). ACCCCUG>p is isobaric with
CACUCCG>p and is present in both RNA digests. BPI chromatogram: chromatogram monitored by BPI [BPI: Base Peak Intensity; the most
intense (highest signal) peak in a mass spectrum in the range of m/z values measured].
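The two monitored m/z values in panel B differ by about 7.01, which is half the mass added by one methyl group and therefore suggests doubly charged ions. The following sketch makes that arithmetic explicit; it is offered only as an illustration, the function name is hypothetical, and the methylation mass shift (replacement of H by CH3, i.e. a net gain of CH2) is computed from standard monoisotopic atomic masses rather than taken from the paper.

```python
# Illustrative arithmetic: infer the charge state from the m/z spacing
# between an unmethylated and a singly methylated oligonucleotide.
# Values 1099.14 and 1106.15 are the monitored m/z values quoted in the
# Figure 2 legend; everything else is an assumption of this sketch.

# Net mass added by one methylation (CH2): 12.00000 + 2 * 1.007825
METHYL_SHIFT_DA = 14.01565


def infer_charge(mz_unmodified, mz_modified,
                 mass_shift=METHYL_SHIFT_DA, max_charge=6):
    """Return the charge whose expected m/z spacing best fits the data."""
    observed_spacing = abs(mz_modified - mz_unmodified)
    return min(
        range(1, max_charge + 1),
        key=lambda z: abs(observed_spacing - mass_shift / z),
    )


charge = infer_charge(1099.14, 1106.15)
```

With these inputs the best-fitting charge is 2, since 14.01565 / 2 is approximately 7.008, in agreement with the observed spacing of 7.01.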



2714 Nucleic Acids Research, 2014, Vol. 42, No. 4 




Figure 3. U1-tfs are formed at early steps of U1 snRNP biogenesis. (A) RNAs were pulled down using DAP-SMN, FLAG-fused PHAX, U1-70K,
snurportin or LUC7 as affinity bait. RNAs were separated by denaturing urea-PAGE and visualized by northern blotting with probes #1-4.
(B) Total cell extract of T-REx 293 cells (left panel) or Gemin5-HEF-expressing T-REx 293 cells (right panel) was separated into 10 fractions
by glycerol gradient (10-30%) ultracentrifugation. Each fraction was analyzed by immunoblotting with antibodies against the proteins indicated.
Northern blotting with DNA probe #1 or #3 showed a typical U1 RNA distribution pattern of this ultracentrifugation fractionation. Sedimentation
values are indicated above the elution profile. (C) Fractions 2 and 3 (mixture A), 5 and 6 (mixture B) or 8 and 9 (mixture C) in (B) were mixed and
subjected to pull-down analysis with Gemin5-HEF as affinity bait. Proteins and RNAs pulled down with Gemin5-HEF were analyzed by
immunoblotting (IB) with antibodies against the proteins indicated and by northern blotting with the DNA probes indicated, respectively.



snurportin and LUC7 as affinity bait to pull down U1-tfs
and constructed doxycycline-inducible cell lines for their
expression. PHAX associates with pre-snRNAs soon after
their transcription, and the PHAX-pre-snRNA complex is
exported with CBP20/80 and other proteins to the cytoplasm
(42). Snurportin recognizes the m3G cap of U1
snRNA in the cytoplasm and is imported with importins
into the nucleus. LUC7 is a human homolog of budding
yeast Luc7p, which recognizes and binds the 5' splice site
of pre-mRNA (43). Pull-down analyses showed that U1-tfs
associated with PHAX but not snurportin or LUC7
(Figure 3A). This result was consistent with the fact that
the U1-tfs cap is only monomethylated and suggested that
U1-tfs associate with at least SMN and PHAX and are
only present in early steps of snRNP biogenesis, i.e. before
5' cap hypermethylation.

To further analyze the SMN complexes associated
with U1-tfs, we used glycerol gradient ultracentrifugation
to prepare 10 fractions of the total cell extract of T-REx
293 cells or doxycycline-inducible Gemin5-hemagglutinin/
TEV protease cleavage site/FLAG (HEF)-expressing
T-REx 293 cells. U1-tfs eluted throughout the fractions
with two distinct peaks at fractions 3 and 7,
whereas SMN eluted in fractions 3-10 with a single peak
at fraction 7 (Figure 3B). We mixed fractions 2 and 3
(mixture A), 5 and 6 (mixture B) and 8 and 9 (mixture
C), pulled down the proteins in each mixture using
Gemin5-HEF as affinity bait and identified the pulled-down
proteins by immunoblot analysis. This analysis
showed that Gemin5-HEF associated with CBP80,
PHAX, Gemin3 and Gemin4 as well as with mature U1
snRNA and U1-tfs in mixture A (Figure 3C). However,
SMN, Gemin2, Gemin8 and Unrip were not detected in
this Gemin5-HEF-associated complex (Figure 3C). On the
other hand, in mixtures B and C, Gemin5-HEF associated
with Gemin2, Gemin3, Gemin4, Gemin8, Unrip and SMN
as well as with mature U1 snRNA and U1-tfs (Figure 3C).
We further examined the association of U1-tfs with
Gemin2, Gemin6, SmB/B', SmD1 or SmE by using
the HEF-fused form of each protein, expressed in the
corresponding doxycycline-inducible cell line, as the affinity
bait and showed
that all of those proteins were associated with U1-tfs
(Supplementary Figure S3). These results suggested that
U1-tfs formed at least two distinct complexes: one
containing Gemin3, Gemin4, PHAX and Gemin5 but lacking
Gemin2, Gemin8, Unrip and SMN, and the other
containing many known components of SMN complexes
including Gemin2, Gemin8, Unrip and SMN. Thus,
U1-tfs could be formed at early stages of U1 snRNP
biogenesis, at least before the association with SMN.

U1-tfs are formed from transcripts of the WT U1 gene
construct and of constructs lacking the 3' box element or having
defects in the SL4 region

Given that the human genome contains a number of bona fide
loci for U1 snRNA genes and pseudogenes,
we examined the possibility that U1-tfs arose from some
of those loci. The human genome has three U1
genes containing a 164-base U1-coding region (SL1, SL2,
SL3, Sm, SL4) and three cis-acting elements: the distal
sequence element (DSE), the proximal sequence element
(PSE) and the 3'box (35,44). We used vector pcDNA3.1
to construct various U1 snRNA genes, namely, WT,
ΔSmSL4, ΔDSE, ΔPSE and Δ3'box (Supplementary
Figure S4A and B). WT vector contained all of the
elements and the region containing the 380-nt sequence
downstream of the 3'box reported in the U1 gene (ID
26871) and was used as a control for the full gene expressing
mature U1 snRNA (Supplementary Figure S4A
and B). ΔSmSL4 lacked the Sm-binding site and SL4
region and was used to express U1-tfs. ΔDSE, ΔPSE
and Δ3'box lacked the DSE, PSE and 3'box region,
respectively (Supplementary Figure S4B). We transiently
transfected each vector into 293 EBNA cells [a human
embryonic kidney cell line that stably expresses the Epstein-Barr
virus (EBV) EBNA-1 gene from pCMV/EBNA] and
detected U1-tfs in total cellular RNA by northern
blotting with probes #1 and #3. The U1-tfs level increased
significantly on transfection with ΔSmSL4 or Δ3'box but not
with WT, ΔDSE or ΔPSE (Supplementary Figure S4C).
We also constructed vectors containing U1 genes fused
with an RNA tag that distinguishes exogenously expressed
U1 snRNA from endogenous U1 (Figure 4A). For the
RNA tag, we used y18Sn (yeast 18S neutral), which does
not form a stable secondary structure and is expected not
to inhibit normal RNA activity (45), or RAT (RNA
Affinity in Tandem), which was previously used to
affinity purify 7SK RNP (36). We first constructed an
additional six expression vectors (y18Sn-WT, 199 nt;
y18Sn-ΔSmSL4, 160 nt; y18Sn-ΔDSE, 199 nt; y18Sn-ΔPSE,
199 nt; y18Sn-Δ3'box, 199 nt; y18Sn-ΔSm,
191 nt), each of which had an extra ATACTTACCTG
sequence and a y18Sn tag at the 5' end of the U1 coding
region (Figure 4A). The ATACTTACCTG sequence
corresponds to the first 11 nucleotides from the 5' end of U1
snRNA, and its presence at the 5' terminus is required for
base pairing with a canonical 5' splice site (35). Transient
expression of those vectors and northern blotting with a
probe complementary to y18Sn confirmed the results
obtained for expression of WT, ΔSmSL4, ΔDSE, ΔPSE
or Δ3'box; namely, absence of the 3'box increased the
level of U1-tfs (Figure 4B). These analyses showed that
y18Sn-WT also formed U1-tfs (Figure 4B). In addition,
these analyses revealed that y18Sn-ΔDSE produced a
full-length mature U1 snRNA but not U1-tfs, whereas
y18Sn-ΔPSE did not produce a detectable transcript
(Figure 4B). Interestingly, y18Sn-ΔSm expression
produced mostly the transcript of expected size and
produced U1-tfs at a ratio similar to that of y18Sn-WT
(Figure 4B). As shown in Supplementary Figure S4D, we
obtained similar results using RAT-tagged expression
vectors, indicating that the formation of U1-tfs did not
depend on the tag used. Overall, these results suggested
that both WT and Δ3'box U1 genes contribute to the
formation of U1-tfs, whereas the Δ3'box U1 gene appears
to produce U1-tfs more efficiently than the WT gene.

We next examined the possibility that U1-tfs arose from
structural defects in U1 snRNA transcripts. We
constructed six other variant expression vectors
(y18Sn-ΔSL4-1, y18Sn-ΔSL4-2, y18Sn-ΔSL4-3,
y18Sn-ΔSL4-1Δ3'box, y18Sn-ΔSL4-2Δ3'box, y18Sn-ΔSL4-3Δ3'box;
Figure 4C). y18Sn-ΔSL4-1 lacked the SL4 region and
was expected to produce a 172-nt y18Sn-U1 snRNA,
whereas y18Sn-ΔSL4-2 (185 nt) and y18Sn-ΔSL4-3
(186 nt) lacked the first and last half of SL4, respectively.
The other three constructs, y18Sn-ΔSL4-1Δ3'box,
y18Sn-ΔSL4-2Δ3'box and y18Sn-ΔSL4-3Δ3'box, lacked the
3'box region of y18Sn-ΔSL4-1, y18Sn-ΔSL4-2 and
y18Sn-ΔSL4-3, respectively. Despite the different sizes of
the expected transcripts, transient expression of these six
vectors almost exclusively yielded U1-tfs (Figure 4D). As
an exception, y18Sn-ΔSL4-1 expression produced U1-tfs
as well as an RNA species having the size of U1 snRNA
lacking SL4; however, this RNA species was not observed
on expression of y18Sn-ΔSL4-1Δ3'box (Figure 4D). Use
of the RAT-tagged vectors yielded results similar to those
obtained for the y18Sn-tag constructs; e.g. defects in the
SL4 region primarily yielded U1-tfs (Supplementary
Figure S4E). Thus, the formation of U1-tfs is related to
the transcriptional events of a gene that lacks the 3'box
or has a deficiency in the SL4 region.
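The expected transcript sizes quoted for the y18Sn-tagged constructs can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes the sizes read as 199, 160, 191, 172, 185 and 186 nt, uses the 164-nt U1 coding region stated above, and writes the deletion symbol as a plain "d" in the hypothetical dictionary keys.

```python
# Illustrative bookkeeping of expected transcript lengths for the
# y18Sn-tagged constructs. Sizes are those quoted in the text; the
# dictionary keys ("d" standing for the deletion symbol) and function
# names are assumptions of this sketch.

U1_CODING_NT = 164  # length of the U1-coding region stated in the text

EXPECTED_NT = {
    "y18Sn-WT": 199,
    "y18Sn-dSmSL4": 160,
    "y18Sn-dSm": 191,
    "y18Sn-dSL4-1": 172,  # whole SL4 region removed
    "y18Sn-dSL4-2": 185,  # first half of SL4 removed
    "y18Sn-dSL4-3": 186,  # last half of SL4 removed
}


def five_prime_extension_nt():
    """Nucleotides added 5' of the U1 coding region (U1 prefix + tag)."""
    return EXPECTED_NT["y18Sn-WT"] - U1_CODING_NT


def deleted_nt(construct):
    """Nucleotides removed relative to the tagged WT transcript."""
    return EXPECTED_NT["y18Sn-WT"] - EXPECTED_NT[construct]


deletions = {name: deleted_nt(name) for name in EXPECTED_NT}
```

Under these assumptions the 5' extension is 35 nt, and the two half-SL4 deletions (14 and 13 nt) add up to the 27 nt removed in the full SL4 deletion, so the quoted sizes are internally consistent.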

The U1-tfs snRNA localizes primarily to P-bodies

We used FISH to examine the localization of U1-tfs in cells.
We transiently expressed y18Sn-WT, y18Sn-ΔSmSL4
or y18Sn-ΔSL4-1 in 293T cells and detected y18Sn-WT
Figure 4. Formation of U1-tfs depends on transcription from the U1 gene. (A) Schematic diagram of the U1 gene construct expressing RNA-tagged
(y18Sn or RAT)-U1 snRNA or its deleted forms. See the text for the explanation of each construct (RNA tag-WT, y18Sn-WT or RAT-WT; RNA
tag-ΔSmSL4, y18Sn-ΔSmSL4 or RAT-ΔSmSL4; RNA tag-ΔPSE, y18Sn-ΔPSE or RAT-ΔPSE; RNA tag-ΔDSE, y18Sn-ΔDSE or RAT-ΔDSE;
RNA tag-Δ3'box, y18Sn-Δ3'box or RAT-Δ3'box; and RNA tag-ΔSm, y18Sn-ΔSm or RAT-ΔSm). (B) Total RNA was extracted from 293 EBNA
cells transiently transfected with an expression vector encoding one of the y18Sn-tagged constructs. RNAs were analyzed by northern blotting with a
probe complementary to the y18Sn sequence or with probe #1. An expression vector encoding RAT-7SK was used as a transfection control. (C)
Schematic diagram of an RNA-tagged (y18Sn or RAT)-U1 gene construct (with or without 3'box) having a defect in the SL4 region. Each construct
is explained in the text. (D) Total RNA extracted from 293T cells transiently transfected with an expression vector encoding one of the
y18Sn-tagged constructs was analyzed by northern blotting with the y18Sn probe or probe #1.



or y18Sn-U1-tfs using the Cy3-labeled probe for the y18Sn tag
(Cy3-y18Sn). Endogenous U1 snRNA and exogenously
expressed U1 snRNA were also detected using the
FITC-labeled probe for region SL1, SL3 or SL4 (FITC-#1,
FITC-#3 or FITC-#SL4) corresponding to the sequence
of probe #1 or #3 used for northern blotting or #SL4
shown in Figure 1D. FISH using a sequence complementary
to y18Sn or U1 snRNA (probe #1) indicated that
y18Sn-WT localized almost exclusively in the nucleoplasm
(Figure 5A), suggesting that the y18Sn tag did not interfere
with U1 snRNA localization. Cy3-y18Sn staining resulted
in a number of cytoplasmic dots in the y18Sn-ΔSmSL4-

Figure 5. U1-tfs localize in P-bodies. (A) The 293T cells transfected with an expression vector encoding y18Sn-WT, y18Sn-ΔSmSL4, y18Sn-Δ3' box,
y18Sn-ΔSm or y18Sn-ΔSL4-1 were subjected to FISH. Endogenous U1 snRNA, exogenous U1 snRNA and U1-tfs were detected with probe #1
labeled with FITC (green). Exogenously expressed U1 snRNA or U1-tfs were also detected with the Cy3-labeled y18Sn probe (red). DAPI staining
shows the nucleus. Merge: FITC, Cy3 and DAPI staining are merged. Scale bar: 10 μm. (B) y18Sn-WT-, y18Sn-ΔSm-, y18Sn-ΔSL4-1-,
y18Sn-Δ3' box- or y18Sn-U1A3ΔSmSL4-expressing cells were stained by immunocytochemistry with antibodies against the proteins (green) indicated and by
FISH with the Cy3-labeled y18Sn probe (red). (C) Proteins were pulled down (PD) from extract of RAT-WT- or RAT-ΔSL4-1-expressing cells or
from control cells (co-transfected with vectors pcDNA3.1-PP7CP-HF and pcDNA3.1) by RAT-based affinity purification. RAT-tagged RNAs were
detected by northern blotting with the RAT probe (left). PD: RAT, RAT-tagged RNA-protein complex bound to FLAG-tagged PP7CP was pulled
down with anti-FLAG-conjugated beads and eluted with FLAG peptide. Proteins were visualized by immunoblotting (IB) with antibodies against the
proteins indicated (right).

or y18Sn-ΔSL4-1-expressing cells, but few or no
such dots were observed in y18Sn-WT-expressing
cells (Figure 5A). In y18Sn-Δ3' box-expressing cells,
few cytoplasmic dots were observed (Figure 5A and B).
In y18Sn-ΔSmSL4- or y18Sn-ΔSL4-1-expressing cells,
use of FITC-#1 or FITC-#3 yielded co-staining with
Cy3-y18Sn in the cytoplasmic dots (Figure 5A and
Supplementary Figure S5A). Those probes also showed
nucleoplasmic staining (Figure 5 and Supplementary
Figure S5A). On the other hand, FITC-#SL4 could not
stain the Cy3-y18Sn-positive dots in y18Sn-ΔSL4-1-expressing
cells (Supplementary Figure S5A), indicating
that the molecular species lacking the SL4 region localized
to the dots. In addition, these Cy3-y18Sn-stained dots
co-localized with those stained with an antibody against
SMN (Supplementary Figure S5B), Gemin2, Gemin5,
Gemin6, Gemin8 (Supplementary Figure S5C) or
DCP1A, the last of which is a marker for P-bodies, in
y18Sn-ΔSL4-1-expressing cells (Figure 5B). However,
SmB/B' did not co-localize with the Cy3-y18Sn-stained
dots in those cells (Figure 5B). On expression of y18Sn-WT,
most of the y18Sn-U1 snRNA was detected in the nucleus
and its localization pattern was comparable to that of
endogenous U1.

Using RAT-tagged RNAs as affinity bait,
RNA-associated proteins were pulled down with
anti-FLAG-conjugated beads from cell extracts of
RAT-WT-, RAT-ΔSL4-1-, RAT-Δ3' box-, RAT-ΔSL4-2- or
RAT-ΔSL4-2Δ3' box-expressing cells co-expressing
FLAG-tagged PP7CP and detected by immunoblotting.
RAT-WT-expressing cells produced mostly RAT-U1
snRNA (Figure 5C), which associated with all the
components of SMN complexes we examined (Figure 5C).
We validated the specificity of RAT-tagged RNA by
using RAT-7SK, which associated with La/SSB as
reported (Figure 5C) (36). On the other hand,
RAT-ΔSL4-1-expressing cells produced mainly RAT-U1-tfs
(Figure 5C). The pull-down showed that RAT-U1-tfs
associated with all the proteins that bind RAT-U1
snRNA except SmB/B', SmD1 and U1A (Figure 5C).
Similar results were obtained by using RAT-ΔSL4-2- and
RAT-ΔSL4-2Δ3' box-expressing cells (Supplementary
Figure S6A). RAT-Δ3' box-expressing cells produced not
only RAT-U1 snRNA but also U1-tfs; accordingly, less
SmB/B' was pulled down relative to the other
proteins pulled down by RAT-U1 snRNA or
RAT-U1-tfs (Supplementary Figure S6A). These results indicated
that U1-tfs form a complex with most of the known
components of SMN complexes except SmB/B', SmD1
and U1A and suggested that U1-tfs are transported to
P-bodies with those proteins.

To validate RNA-tagged RNA species in terms of
post-transcriptional modifications, we analyzed the
pulled-down RAT-U1 snRNA (produced from RAT-WT)
and RAT-U1-tfs (produced from RAT-ΔSL4-1 or
RAT-ΔSL4-1Δ3' box) using the LC-MS/MS-Ariadne
method after RNase T1 digestion of their corresponding
SYBR Gold-stained RNA bands excised from
urea-PAGE gels. RAT-WT produced U1 having the m3G cap
(~99% of the U1 population) and methylation at position
70 (~40% of the population), but RAT-U1-tfs had the
m7G cap exclusively and had base methylation at the
first adenine in ~50% of its population (Supplementary
Figure S6B and Supplementary Table S4). Thus,
RAT-tagged transcripts underwent post-transcriptional
modifications in a manner similar to that observed for the
corresponding endogenous transcripts.
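
As an illustration of the RNase T1 step in this workflow, the following minimal sketch lists the fragments expected from an in silico digest, using the well-known specificity of RNase T1 (cleavage on the 3' side of guanosine). The function name and the toy sequence are hypothetical and are not taken from this study; the sketch ignores modified nucleotides, cap structures, terminal phosphates and missed cleavages, and it is not a description of how Ariadne works.

```python
def rnase_t1_digest(seq: str) -> list[str]:
    """Return fragments from an idealized RNase T1 digest.

    Assumption: complete cleavage after every G and no other cleavage.
    """
    fragments = []
    current = ""
    for base in seq.upper():
        current += base
        if base == "G":          # RNase T1 cuts 3' of guanosine
            fragments.append(current)
            current = ""
    if current:                  # 3'-terminal fragment that does not end in G
        fragments.append(current)
    return fragments


# Hypothetical toy sequence, used only to show the behavior of the function.
toy_rna = "AUACGCUUGGAACUGCA"
for fragment in rnase_t1_digest(toy_rna):
    print(fragment)
```

In a real analysis, fragment masses computed from such a list would be compared with the observed MS/MS data, including candidate modifications.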

U1-tfs are degraded more rapidly than mature U1 snRNA

Given the report that P-bodies function in RNA surveillance
and turnover (30), we compared the degradation rate
of U1-tfs with that of mature U1 snRNA. We transfected
the y18Sn-WT, y18Sn-ΔSmSL4 or y18Sn-Δ3' box construct into
293T cells, treated the cells with actinomycin D and
measured the time-dependent changes in the cellular
levels of U1 snRNA and U1-tfs in the total RNA. As
shown in Figure 6A, the level of U1-tfs decreased
much more rapidly than that of mature U1 snRNA,
suggesting that U1-tfs are degraded efficiently in P-bodies.
Given that U1-tfs co-localized with SMN and its associated
components in P-bodies, we next asked whether SMN
plays an active role in this degradation pathway, that is,
whether SMN participates in the P-body localization and
degradation of U1-tfs. To examine this, we took advantage of
previously characterized mutations in the SL1 region of U1
snRNA (U1A3) that abolish U1 binding to SMN, as
reported by Yong et al. (11). We constructed the
y18Sn-U1A3ΔSmSL4 vector, expressed the construct in 293T
cells and detected y18Sn-tagged RNA by northern blot or
in situ hybridization analysis. If SMN participates in the
P-body localization and degradation of U1-tfs, this transcript
should not localize to P-bodies and should not be degraded
efficiently. Those analyses, however, detected only a minute
amount of the truncated form of y18Sn-U1A3ΔSmSL4
(Figure 6B) but showed dominant P-body localization of
y18Sn-U1A3ΔSmSL4 (Figure 5B). These results suggested
that SMN did not have a direct role in the P-body
localization of U1-tfs, although we could not evaluate the role of
SMN in the degradation rate of U1-tfs because of its
extremely low expression level in the cells.
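
A transcription-shutoff time course such as the actinomycin D experiment above is commonly summarized as a half-life by assuming first-order decay, N(t) = N0 exp(-kt), so that t1/2 = ln 2 / k. The sketch below fits k by least squares on log-transformed levels. It is a generic illustration and not the quantification used in this study; the function name and the synthetic numbers are hypothetical and do not correspond to Figure 6.

```python
import math


def fit_half_life(times_h, levels):
    """Fit ln(level) = ln(N0) - k * t by ordinary least squares.

    Returns (k, half_life) in units set by times_h.
    Assumption: single-exponential decay and strictly positive levels.
    """
    logs = [math.log(v) for v in levels]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    k = -sxy / sxx               # decay constant is the negative slope
    return k, math.log(2) / k


# Synthetic data generated from a known decay constant (not measured values).
true_k = 0.5
times = [0.0, 1.0, 2.0, 4.0]
signal = [100.0 * math.exp(-true_k * t) for t in times]
k_est, t_half = fit_half_life(times, signal)
print(f"k = {k_est:.3f} per hour, half-life = {t_half:.3f} hours")
```

Comparing the fitted half-lives of two RNA species from the same time course gives a single number for how much faster one is lost than the other.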

SmB/B' deficiency or deletion of the Sm-binding site
induces P-body localization of U1 snRNP

Given our results that U1-tfs are localized in P-bodies but
do not associate with SmB/B' or SmD1, we considered the
possibility that P-body localization of U1-tfs is induced by
failure of the assembly of Sm proteins on the Sm site.
We therefore postulated that SmB/B' deficiency would
cause full-length U1 snRNA to localize to P-bodies.
Knockdown of SmB/B' with stealth RNA (siRNA1; si-1)
increased the number of P-bodies in the cytoplasm
compared with a control RNA (scrambled sequence), as
detected with an antibody against DEAD (Asp-Glu-Ala-Asp)
box protein 6 (DDX6), Sm-like protein (LSM)14A,
Gemin5 or DCP1A, which is a marker for P-bodies
(Figure 7A-C). SmB/B' knockdown reduced the level of
U1 snRNA (Supplementary Figure S7A) and induced
P-body localization of not only endogenous U1 snRNA
but also SMN and Gemin5 (Figure 7A-C). Use of another
siRNA (siRNA2; si-2, which has a different sequence
from si-1) for SmB/B' knockdown also showed P-body

Figure 6. U1-tfs are degraded more rapidly than mature U1 snRNA. (A) RNAs were prepared from y18Sn-WT-, y18Sn-ΔSmSL4- or y18Sn-Δ3'

localization of U1 snRNA (Supplementary Figure S7B).
We also observed that the knockdown with si-1 and si-2
reduced the level of U2 snRNA and induced P-body
localization of U2 snRNA, respectively (Supplementary
Figure S7A, S7C and S7D). SmB/B'-deficient cells also
showed reduced staining for FITC-#1 and Cy3-U2
(Figure 7B and Supplementary Figure S7C and D),
consistent with reduced SYBR Gold staining for U1 and U2
snRNA in total RNA (Supplementary Figure S7A).
Furthermore, y18Sn-ΔSm RNA, which lacks the Sm-binding
site but not the SL4 region, co-localized with SMN in
P-bodies (Supplementary Figure S5B). However, the
siRNA-mediated knockdown of SMN did not cause
P-body localization of y18Sn-WT that was co-transfected
with the siRNA (Supplementary Figure S7E). These
results strongly suggested that the failure of Sm protein
loading onto U1 snRNA induced localization of the
U1 snRNA-SMN complex to P-bodies and accelerated
the degradation of U1 snRNA (Figure 7A-C and
Supplementary Figure S7A), whereas SMN did not have
a direct role in the P-body localization of U1 snRNA and
U1-tfs (Supplementary Figure S7E).

Figure 7. Failure of SmB/B' loading increases the number of P-bodies and causes P-body localization of U1 snRNA. (A) 293T cells transfected
transiently with a stealth SmB/B' siRNA (si-1) or scrambled RNA (sc) were analyzed by immunocytochemistry using anti-SmB/B' primary antibody
and Cy3-labeled secondary antibody (red). The other proteins were detected with their corresponding antibodies and FITC-labeled secondary
antibody (green). DAPI staining shows the nucleus. Merge: FITC, Cy3 and DAPI staining are merged. Scale bar: 10 μm. (B) SmB/B' was
detected with anti-SmB/B' primary antibody and Cy3-labeled secondary antibody (red). Gemin5 was detected with its antibody (green). U1
snRNA was detected by FISH with probe #1 (green). (C) Cells were analyzed by combination of FISH with probe #1 (green) and
immunocytochemistry with the antibodies indicated (red).

DISCUSSION 

In this study, we demonstrated that U1-tfs were formed at
steps before 5' cap hypermethylation, were diverted from
the canonical pathway of U1 snRNP biogenesis owing to
the failure of Sm protein loading onto U1-tfs lacking the
Sm-binding site, and were destined for localization in
P-bodies, where surveillance and degradation of RNA
take place (Figure 8). We thus uncovered at least one additional
pathway for the degradation of aberrant snRNAs, beyond
the known quality control step in snRNP biogenesis.

On retrieval of sequences for the U1 genes or related
genes from the genome database reference assembly
(version GRCh37), we localized 41 loci for these genes
in the human genome (Supplementary Table S5). Of
these, 3 loci contain the U1 snRNA coding region and
all 3 cis-acting elements (DSE, PSE and 3' box) (full U1
genes), 7 loci contain DSE and PSE but no 3' box (Δ3'
box-U1 genes), 4 loci contain only DSE (DSE-U1 genes),
5 loci contain only PSE (PSE-U1 genes) and the remaining
22 loci have the U1 coding region but lack the cis-acting
elements (U1 pseudogenes). Our present results indicated
that not only full-length U1 genes but also Δ3' box-U1
and PSE-U1 genes could be transcribed and produce
mature U1 snRNA, suggesting that full-length U1 genes
and Δ3' box-U1 genes contribute to the formation of
U1-tfs. Given that the 3' box element is required for
appropriate processing with the integrator complex at the
3' end of pre-U1 snRNA (46), we propose that failure of
3' end processing or inappropriate processing leads to the
formation of U1-tfs. When transcription reaches the
3' box cis-acting element, the integrator complex cleaves
pre-U1 snRNA at the proper 3' site. Cleaved pre-U1
snRNA is then trimmed by a nuclear 3'-5' exonuclease.
Failure or inefficiency in cleavage of pre-U1 snRNA by
the integrator complex during transcription of Δ3' box-U1
genes, and less frequently during that of full-length U1
genes, may result in a pre-U1 snRNA longer than that
normally transcribed from full-length U1 genes, which may
perturb the secondary structure of the 3' end region of
the pre-U1 snRNA. In addition, defects in the SL4
region of U1 genes may result in the formation of
U1-tfs, indicating the importance of the SL4 region of
pre-U1 snRNA for appropriate formation of mature U1
snRNA; therefore, we also suggest that mutation in or
aberrant transcription of the SL4 region leads to the
formation of U1-tfs.
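
The locus classification described above can be written as a simple rule on the presence of the three cis-acting elements. The sketch below encodes those rules and checks that the five reported class sizes add up to the 41 loci. The function and label names are hypothetical, and the per-class counts are taken from the text rather than recomputed from Supplementary Table S5; combinations not mentioned in the text are returned as "unclassified".

```python
from collections import Counter


def classify_locus(has_dse: bool, has_pse: bool, has_3box: bool) -> str:
    """Assign a U1-related locus to one of the classes named in the text."""
    if has_dse and has_pse and has_3box:
        return "full"            # full U1 gene: all three elements
    if has_dse and has_pse:
        return "delta_3box"      # DSE and PSE but no 3' box
    if has_dse and not has_3box:
        return "dse_only"
    if has_pse and not has_3box:
        return "pse_only"
    if not (has_dse or has_pse or has_3box):
        return "pseudogene"      # coding region without cis-acting elements
    return "unclassified"        # combination not described in the text


# (DSE, PSE, 3' box) flags mapped to the number of loci reported in the text.
REPORTED = {
    (True, True, True): 3,
    (True, True, False): 7,
    (True, False, False): 4,
    (False, True, False): 5,
    (False, False, False): 22,
}

loci = [flags for flags, count in REPORTED.items() for _ in range(count)]
tally = Counter(classify_locus(*flags) for flags in loci)
total = sum(tally.values())
print(dict(tally), total)
```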

The analyses of Gemin5-associated complexes prepared
from the fractions separated by ultracentrifugation
suggested that PHAX- and Gemin5-associated U1-tfs were
formed first and then were combined with the SMN
complex composed of Gemins 2, 8 and unrip (Figure 3B
and C). Pull-down analysis using RAT-U1-tfs (Figure 5C)
also showed that the U1-tfs-SMN complex contained
almost all the components of the known SMN complex
except SmB/B', SmD1 and U1A. Failure to incorporate
Sm proteins into the Sm site perhaps results in the inability
to displace the N-terminal region of Gemin2, and therefore
the heptameric Sm ring cannot be fully formed.
The U1-tfs-SMN complex localized almost exclusively
in P-bodies (Figure 5A and B and Supplementary
Figure S5B), an aspect distinct from the normal
U1 snRNA-SMN complex (Figure 5A and B and
Supplementary Figure S5B); thus, the complex was
discriminated from the normal U1 snRNA-SMN
complex in terms of cellular localization. Use of the
y18Sn-U1A3ΔSmSL4 mutant, which lacked the ability to bind to
SMN, however, suggested that SMN had no active role in
this discrimination (Figure 5B).

Because the U1-tfs-SMN complex lacks at least SmB/B'
and SmD1, we hypothesized that this discrimination is a
consequence of the failure of Sm protein loading into the
Sm pentamer's RNA-binding pocket. This hypothesis was
substantiated by the result that SmB/B' knockdown or
the expression of a truncated U1 snRNA lacking the
Sm-binding site (y18Sn-ΔSm) increased the number of
P-bodies (Figure 5B); moreover, SmB/B' knockdown
induced P-body localization of endogenous U1 snRNA
(Figure 7B). Based on these data, coupled with the result
that U1-tfs were degraded more rapidly than was mature
U1 snRNA (Figure 6A), we conclude that a novel
surveillance pathway for U1 snRNA exists in human cells;
namely, inappropriate termination or defects in the SL4
region of U1 snRNA cause processing to U1-tfs lacking
the Sm site, which are then eliminated through P-bodies
(Figure 8). The result that SmB/B' knockdown
reduced the cellular level of U1 snRNA (Figure 7B and
Supplementary Figure S7A) also supports this conclusion.

We also observed that SmB/B' knockdown reduced the
cellular level of U2 snRNA (Supplementary Figure S7A).
Similar results were reported by Saltzman et al. (47), in
which depletion of SmB/B' in human cells resulted in
reduced levels of Sm-class snRNAs (U1, U4, U5, U11,
U12 and U4atac) but not LSm-class snRNAs (U6 and
U6atac), and a similar reduction of U2 snRNA was
observed on knockdown of SmD1 (47). These reports as
well as our present results suggest that a general snRNA
surveillance mechanism may facilitate the elimination of
inappropriate Sm-class snRNAs via the recognition of
truncated forms of those snRNAs in SMN complexes. It
will thus be intriguing to investigate whether short forms
of other Sm-class snRNAs exist and whether they are
eliminated via a similar pathway.

Our present study raises many questions, one of which
concerns how U1-tfs are formed. Are U1-tfs formed in the
nucleus or after transport to the cytoplasm? Which
ribonuclease is involved? What is the mechanism by which
U1-tfs transcripts are distinguished from normal U1
transcripts? Do differences in post-transcriptional
modifications at positions 1 and 70 between U1-tfs and mature
U1 snRNA contribute to this discrimination mechanism?
Addressing these issues will require a fuller understanding
of the surveillance pathway of U snRNAs.

Recently, apart from its roles in splicing, U1 snRNP
was reported to regulate transcript length through
co-transcriptional recognition of cryptic polyadenylation
signals and inhibition of premature cleavage and
polyadenylation at these sites (48,49). Thus, the cellular
level of U1 snRNP relative to the level of general
transcription may determine the transcriptome and proteome
of specific cell types. Our present results raise the
additional possibility that the level of U1 snRNP is regulated
by cellular levels of proteins, such as SmB/B', whose
absence invokes the surveillance pathway we have
identified.

Our present study provides a good example of the
beneficial use of MS for discovering a new degradation
pathway during U1 snRNP biogenesis. In general, RNPs
function in many cellular processes. Elucidating the
cooperative actions between RNAs and proteins is
important for understanding biological systems.
Post-transcriptional or metabolic modifications of RNA play
vital roles in cooperative actions with proteins. Direct
analysis of RNA using MS allows unbiased identification
of RNA and has great advantages in high-throughput
analysis to acquire information about RNA modifications
(31-34). In many diseases, especially neurological diseases,
RNA metabolism is altered, and thus the methods for
direct analysis of RNAs described in our present study
may be useful for understanding the pathogenesis of
those diseases.